Background

The Longitudinal Surveys of Australian Youth (LSAY) is a research program that tracks 
young people as they move from school into further study, work and other destinations. It 
uses large, nationally representative samples of young people to collect information about 
education and training, work, and social development. 

The program includes surveys conducted from the mid-1970s to the mid-1990s, namely the Youth in
Transition Survey (YITS), the Australian Longitudinal Survey (ALS) and the Australian Youth
Survey (AYS), as well as the current LSAY collection, which began in 1995.

Survey participants (collectively known as a ‘cohort’) enter the study when they are 15 years
old or, as was the case in earlier cohorts, when they are in Year 9. Individuals are contacted
once a year for up to 12 years. Cohorts began in 1995 (Y95 cohort), 1998 (Y98 cohort),
2003 (Y03 cohort) and, most recently, 2006 (Y06 cohort).

Since 2003, the initial survey wave has been integrated with the Organisation for Economic 
Co-operation and Development (OECD) Programme for International Student Assessment 
(PISA). Over 10 000 students start out in each cohort. 

The LSAY research program provides a rich source of information to enable a better 
understanding of young people and their transitions from school to post-school destinations; it 
also explores their social outcomes, such as wellbeing. 

Information collected as part of the LSAY program covers a wide range of school and post-school 
topics, including: student achievement, student aspirations, school retention, social background, 
attitudes to school, work experiences and what students are doing when they leave school. 

LSAY is managed and funded by the Australian Government Department of Education, 
Employment and Workplace Relations (DEEWR), with support from state and territory 
governments. On 1 July 2007, the National Centre for Vocational Education Research (NCVER)
was contracted to provide analytical and reporting services for LSAY for three years. NCVER
is providing this service to the department in collaboration with the Australian National
University’s Social Policy Evaluation, Analysis and Research Centre (SPEAR).

Between 1995 and 2007 the LSAY analytical and reporting services were provided by the 
Australian Council for Educational Research (ACER) jointly with the Department of 
Education, Science and Training 1 (DEST). 

More information can be obtained from the LSAY website: <www.lsay.edu.au> or by 
contacting NCVER: 

Toll free: 1800 825 233 
Ph: +61 8 8230 8400 

Fax: +61 8 8212 3436 

Email: <lsay@ncver.edu.au> 

Website: <www.lsay.edu.au> 



1 Replaced in December 2007 by the Department of Education, Employment and Workplace Relations. 
Using this guide

This User guide has been developed for users of the LSAY data. The guide aims to consolidate
existing technical documentation and other relevant information into a single document,
thereby improving data accessibility and promoting wider use of the LSAY data.

To promote effective use of the data, the guide aims to address all aspects of LSAY data,
including information about how to access the data, data restrictions, variable-naming
conventions, the structure of the data documentation (topic areas, topic maps and data
elements), the classifications and code frames used, weights and derived variables.

A series of additional documents (Data elements A to Data elements D) supplements this User
guide. Data elements represent variables that are common within and across waves. These
documents contain information about the data elements, including the variables they cover, the
valid values (or response options) for each variable and additional notes (where applicable).
The section ‘The LSAY data: Data elements’ in this publication contains further information
about data elements.

Users may also find the following supporting documents useful: 

• Meta-data workbook — provides a listing of variables in the Y98 data set, as well as basic 
information about each variable. Data can be filtered and inspected by wave/year, 
questionnaire section, topic area(s) and/or data element. See ‘The LSAY data: Variable 
listing/meta-data workbook’ for further information. 

• Variable concordance — maps old to new variable names. The section ‘Variable-naming 
conventions: Historical variable names’ contains additional information. 

These documents should be used in conjunction with this User guide as required and can all 
be accessed at the same URL: <www.lsay.edu.au/publications/2199.html>. 

As this is the first version of the User guide, feedback is welcome. If you have any problems
finding the information you need or understanding the information it contains, please do not
hesitate to contact the LSAY branch at NCVER:
<lsay@ncver.edu.au>.
The Y98 cohort

In 1998 a nationally representative sample of about 14 000 Year 9 students was selected to
form the second cohort of the LSAY program. The sample was constructed by randomly
selecting two Year 9 classes from each school in a national sample of 300 schools designed to
represent all states and education sectors. This is referred to as the LSAY Y98 cohort.

Reading and numeracy tests were administered to students to provide information on school 
achievement. Students also completed a background questionnaire about their educational and 
vocational plans and attitudes to school. In 1999, these students provided information in 
response to a mailed questionnaire. Information was also obtained from their schools about 
curricula and school organisation. 

In 2000, members of the sample were contacted in the first of the annual telephone interviews
(conducted by Reark Research, then AC Nielsen). The questionnaire included questions on
school, transitions from school, post-school education and training, work, job history, job 
search history, non-labour force activities, health, living arrangements and finance, and 
general attitudes. Subsequent surveys (conducted by the Wallis Consulting Group) asked 
similar questions but with the emphasis changing from school to post-school education, 
training and work. 

Due to both population shifts over time and survey attrition, care needs to be taken when 
comparing individual waves of the cohort against other samples drawn from different 
populations. For example, it can be misleading to compare the LSAY Y98 wave 3 (2000) 
indicators against 18-year-olds from other surveys in the same year. 

Prior to the development of this User guide, a range of documents contained information 
about the Y98 cohort. These documents were categorised as codebooks, cohort reports, 
technical papers and research reports. 

All Y98 LSAY technical documents can be accessed at: <www.lsay.edu.au/data/21070.html>. 

Codebooks 

The Y98 codebooks provide a series of frequency tables for each variable as well as the 
questionnaire for that survey year. 

LSAY codebooks can be accessed at <www.lsay.edu.au/data/21070.html>. Table 1 provides a 
summary of the available codebooks. 
Table 1 Technical documents — codebooks

Wave/year       Technical report/paper
Wave 1/1998     Technical report no. 20
Wave 2/1999     Technical paper no. 22
Wave 3/2000     Technical report no. 24
Wave 4/2001     Technical report no. 26
Wave 5/2002     Technical report no. 29
Wave 6/2003     Technical report no. 30
Wave 7/2004     Technical report no. 32
Wave 8/2005     Technical report no. 37
Wave 9/2006     Technical report no. 39
Wave 10/2007    Technical report no. 44
Wave 11/2008    Technical report no. 51

Cohort reports 

The Y98 cohort reports summarise the activities of a group of young Australians who were in 
Year 9 in 1998 and at an average age of 15 years, through to the final wave of interviewing in 
2008 when they were, on average, 25 years of age. 

The content of the cohort reports focuses on the areas of educational attainment, employment, 
measures of engagement in study and work, and social outcomes. The cohort reports present a 
series of tables for each of the indicators. Each series of tables can be filtered by a range of 
demographic variables and downloaded into Excel. 

The Y98 cohort reports can be accessed at: <www.lsay.edu.au/cohort/introduction.html>. 

Earlier cohort reports described the education, employment and social participation of young
people in each survey year, together with their experiences and attainment in these domains
up to that point in time.

These previous reports are available in PDF format and can be accessed at: 
<www.lsay.edu.au/cohort/other_search.html>. 

Table 2 provides a summary of the earlier reports. 
Table 2 Technical documents — old cohort reports

Wave/year      Report title
Wave 2/1999    The Year 9 class of 1998 in 1999: Activities and aspirations
Wave 3/2000    The Year 9 class of 1998 in 2000: School and non-school experiences
Wave 4/2001    The Year 9 class of 1998 in 2001: Education, employment and interests
Wave 5/2002    The Year 9 class of 1998 in 2002
Wave 6/2003    The Year 9 class of 1998 in 2003
Wave 7/2004    The Year 9 class of 1998 in 2004
Wave 8/2005    The 1998 LSAY Year 9 cohort report: 21 year-olds in 2005



Other technical papers 

Other technical papers that may be useful cover the sampling methodology, the weighting
methodology and the measurement of socioeconomic status.

Table 3 provides a summary of the existing technical papers/reports for the Y98 cohort.

Technical paper no. 14 can be accessed at: <www.lsay.edu.au/data/31275.html>.

Technical papers no. 16 and no. 48 can be accessed at:
<www.lsay.edu.au/data/31273.html>.

Table 3 Technical documents — other technical papers 
Technical report/paper Title 

Technical paper no. 14 The measurement of socioeconomic status and social class in the LSAY project 

Technical paper no. 16 The designed and achieved sample of the 1998 LSAY sample 

Technical paper no. 48 Estimating attrition bias in the Year 9 cohorts of the Longitudinal Surveys of Australian Youth 
Accessing the data

LSAY data sets are deposited annually with the Australian Social Science Data Archives 
(ASSDA) at the Australian National University in Canberra. Permission to use the data and 
access requirements are managed by ASSDA. Data access requires authorisation from the 
Data Archive Manager. 

The data can be accessed by: 

• Contacting the Australian Social Science Data Archives (details below) and requesting 
the ‘LSAY application to access restricted data’ and ‘Undertaking’ forms 

• Completing the ‘LSAY application to access restricted data’ form 

• Completing the ‘Undertaking’ form 

• Returning the completed forms to the Australian Social Science Data Archives. 

For those interested in earlier data, the current LSAY program builds on the following two
surveys conducted by the Australian Council for Educational Research:

• Youth in Transitions (YIT) — from 1978 to 1996 

• Australian Youth Survey (AYS) — from 1989 to 1997. 

Both these data sets form part of the LSAY suite and are retained at the Australian Social 
Science Data Archives, where they are available for use by researchers. 

Part of NCVER’s role is to promote and encourage the use of the LSAY data. If you have any
feedback or queries about the data and how to access them, contact:

NCVER 

e-mail: <lsayrequests@ncver.edu.au> 

LSAY hotline: 1800 825 233 

Australian Social Science Data Archives 
e-mail: <assda@anu.edu.au> 

phone: 02 6125 4400 

fax: 02 6125 0627 

Specific data requests 

A specific data request allows you to request specific tables and/or data analysis to be 
undertaken by NCVER without having to obtain full sets of the data. 

A specific data request can be made to <lsayrequests@ncver.edu.au>. 

Fees and charges apply to all data requests that take more than one hour to prepare. Please
refer to NCVER’s policy on charging: <www.ncver.edu.au/aboutncver/statistics/data.html>.

LSAY data releases 

Information about the latest LSAY data releases is available from the LSAY website: 
<www.lsay.edu.au/data/latest.html>. 

You may also request to be notified of recent LSAY releases, including publications and
data releases, by subscribing to NCVER’s LSAY alert page at
<www.lsay.edu.au/newsevents/subscribe.html>.
Data restrictions

Data use is restricted to research; data are not to be used for commercial or financial gain. In 
addition, LSAY information by state and school sector cannot be accessed in combination 
with the achievement information. This reflects permission requirements agreed at the time 
the data were collected. LSAY data sets therefore contain either state/sector information or 
achievement data. 

LSAY data users must also agree not to match state/sector information with school
achievement information.

Conditions of use are outlined in the form ‘Longitudinal Surveys of Australian Youth
Undertaking’, which is available on request from ASSDA by email at: <assda@anu.edu.au>.

These conditions of use are as follows: 

1. Use of the material is restricted to use for statistical purposes. This means the user can only 
use the material to produce information of a statistical nature. Examples of such uses are: 

a. the manipulation of data to produce means, correlations or other descriptive summary 
measures 

b. the estimation of population characteristics from sample data 

c. the use of data as input to mathematical models and for other types of analyses (for 
example, factor analysis) 

d. the provision of graphical and pictorial representation of characteristics of the 
population or sub-sets of the population. 

2. The material is not to be used for any non-statistical purposes, or for commercial or 
financial gain without the express written permission of the Data Archive Manager. 

Examples of non-statistical purposes are: 

a. transmitting or allowing access to the data in part or whole to any other 
person/department/organisation not a party to this undertaking 

b. attempting to match unit record data in whole or in part with any other information 
for the purposes of attempting to identify individuals. 

3. Statistical tables, graphs etc. obtained from analysis of these data may be further 
disseminated provided that the user: 

a. acknowledges both the original depositors and the Australian Social Science Data Archive; 

b. acknowledges another archive where the data file is made available through the 
Australian Social Science Data Archive by another archive 

c. declares that those who carried out the original analysis and collection of the data 
bear no responsibility for the further analysis or interpretation of it. 

4. Use of the material is solely at the user’s risk and the user must indemnify the Australian 
National University and the Australian Social Science Data Archive. 

5. The Australian National University and the Australian Social Science Data Archive are 
not held responsible for the accuracy and completeness of the material supplied. 

6. Where applicable: 

a. The user must draw the terms and conditions of the undertaking to the attention of 
persons within the department/organisation who shall make use of the material. 

b. The Australian National University and the Australian Social Science Data Archive 
are not to be held liable for any breach of this undertaking. 

7. LSAY state/sector information cannot be matched with the LSAY student achievement 
information. For this reason, these data are only available in separate files. 
Overview of the LSAY questionnaires

In the first survey wave, reading and numeracy tests were administered to students in their 
schools to provide information on school achievement. Further information on literacy and 
numeracy scoring can be found below. 

Students also completed a background questionnaire about their educational and vocational 
plans and attitudes to school; the questionnaire also collected information on the students 
themselves, their family, and institutional factors, which can help explain performance 
differences. 

The longitudinal nature of the LSAY data collections means that new surveys are closely 
linked to, are comparable with, and build on, the previous surveys. 

Following the collection of written information in the first two years, students are contacted 
annually by telephone and asked a range of questions across the following sections: 

• Section A: School 

• Section B: Transition from school 

• Section C: Post-school study 

• Section D: Work 

• Section E: Job history 

• Section F: Job search activity 

• Section G: Not in the labour force 

• Section H: Living arrangements, finance and health 

• Section J: General attitudes 

The focus of the questionnaires changes as the cohort ages, from school and study when
respondents are younger to employment in later years. For instance, for the Y98 cohort,
Sections A and B were not asked from wave 8 onwards, and Sections E to J were asked only
from wave 3 onwards. Section D was the only section asked in every wave of the Y98 cohort.

The Y98 questionnaires are contained in the series of Y98 codebooks. LSAY codebooks can
be accessed at: <www.lsay.edu.au/data/31273.html>. Table 1 provides a summary of the
available codebooks.

Year 9 achievement in literacy and numeracy 2 

Students were asked to complete two tests, one in literacy and one in numeracy, when they were
first contacted in 1998. From their answers to these two tests, three measures were constructed:
achievement in literacy in Year 9, achievement in numeracy in Year 9, and combined
achievement in literacy and numeracy in Year 9.

The measure of literacy is the student’s raw score on the literacy test, which could range from
0 to 20. The literacy test comprised 20 items. Students read some text and then answered
several questions about it. The texts comprised short newspaper articles and longer
textual passages.

2 This information has been sourced from GN Marks, J McMillan and K Hillman, Tertiary entrance
performance: The role of student background and school factors, LSAY research report, no. 22, ACER,
Camberwell, Vic. 2001. Available online at: <http://www.lsay.edu.au/publications/1869.html>.

The measure of numeracy is the students’ raw score on the numeracy test. Scores could range 
from 0 to 20. The numeracy test comprised 20 questions. Three broad types of questions were 
asked. The first type dealt with mathematical operations (mainly computations) with little or 
no practical component. This included simple operations such as addition and subtraction, and 
more complex operations such as long division, fractions, squares, cubes, and square roots. 
The second type of question required practical applications of numerical skills. Examples are 
questions about buying things, reading scales, tables, and graphs, and calculating interest. The 
third type of question required the application of abstract mathematical concepts. These were 
mainly logical and spatial problems. 

The combined measure of achievement in literacy and numeracy represents an overall
measure of early school achievement. The scores for the literacy and numeracy tests were
centred on their means and summed to produce a combined measure of achievement. The
combined measure was then standardised to a mean of zero and a standard deviation of one.

This measure was used in correlational and regression analyses. For the presentation of means
and box-and-whisker plots, the continuous measure was split into four categories based on
quartiles of achievement (that is, the highest quartile represents the top 25% of students, the
next quartile represents the next 25% of students, and so forth).
The LSAY data



The LSAY data sets are large and particularly complex. On average, more than 400 variables
are collected in each wave, giving close to 5000 variables across the entire data set. To
improve the accessibility of the LSAY data sets, the data have been grouped into common
themes called ‘topic areas’.

Topic areas

The topic areas are organised into four hierarchical levels:

• Major topic areas are the broadest topic areas. There are four major topic areas.

• Sub-major topic areas are subdivisions of the major topic areas. There are 11 sub-major
topic areas.

• Minor topic areas are subdivisions of the sub-major topic areas. There are 71 minor
topic areas.

• Data elements are subdivisions of the minor topic areas. There are about 800 data elements.

The four major topic areas are Demographics, Education, Employment and Social. The
divisions of these major topic areas into sub-major and minor topic areas are
illustrated in figures 1 to 4.



Figure 1 Major topic area 1 — Demographics

Figure 2 Major topic area 2 — Education

Figure 3 Major topic area 3 — Employment

Figure 4 Major topic area 4 — Social




Topic maps 

Topic maps have been developed for each of the 11 sub-major topic areas. The topic maps
aim to improve accessibility of the LSAY data by linking questions (or variables) common
within and across waves. These common variables are identified as data elements.

Topic maps by sub-major topic area can be found in the ‘Topic maps’ section of this User
guide. A summary of the topic maps appears in table 4.


Table 4 Topic maps

Topic map   Major topic area   Sub-major topic area
1           Demographics       Student
2           Demographics       Parent
3           Education          School
4           Education          School transition
5           Education          Post-school
6           Employment         Current
7           Employment         Job history and training
8           Employment         Seeking employment
9           Employment         Not in the labour force
10          Social             Health, living arrangements and finance
11          Social             General attitudes


Data elements 




Data elements represent variables that are common within and across waves. Information
about each data element is contained in the supplementary documents (Data elements A to
Data elements D) of this User guide, which can be accessed at:
<www.lsay.edu.au/publications/2199.html>.

The data element documents are organised by major and sub-major topic area. An overview
of these documents is given in table 5.


Table 5 User guide data element documents

User guide   Major topic area   Sub-major topic area(s)
Part A       Demographics       Student; Parent
Part B1      Education          School; School transition
Part B2      Education          Post-school
Part C       Employment         Current; Job history and training; Seeking employment; Not in the labour force
Part D       Social             Health, living arrangements and finance; General attitudes



For each data element, the following information is provided (where applicable): 

• Data element — the data element name 

• Purpose — what information is provided by the data element 

• Variables — the variable names which correspond to this data element 
• Variable type — whether the variable is in numeric or character format

• Question number/label — the variable label; this includes the question number (where 
applicable) and a short description of the variable 

• Values — the possible values each variable can take and corresponding formats 

• Base population — the syntax for the number of respondents eligible to answer the
corresponding question (note that base populations are currently available only for waves
8 to 11, that is, 2005-08)

• Notes — other information. 

Variable listing/meta-data workbook 

To further assist in using the LSAY data, a meta-data workbook has been developed by 
NCVER. It provides a complete listing of the variables in the Y98 data set, as well as 
information about each variable. Data can be filtered and inspected by wave/year, 
questionnaire section, topic area(s) and/or data element. 

The meta-data spreadsheet can be accessed at: <www.lsay.edu.au/publications/2199.html>. 

The information contained in this workbook is similar to that contained in the topic maps and 
data elements documents, but is formatted differently. This format may be more suitable for 
some users. 

There are three worksheets included in the meta-data workbook: Variables, Values and Base. 
All three worksheets list each variable in the order it appears in the data set. Major, sub-major 
and minor topic areas as well as data elements are provided for each variable. The wave/year, 
questionnaire section and variable label are also included (where applicable). 

The first worksheet, Variables, includes information on the variable type (whether the 
variable is a numeric or character variable) and the variable label (which includes the question 
number and a brief description of the variable). 

The second worksheet, Values, lists each variable and the values that variable can take (where 
applicable). 

The third worksheet, Base, lists each variable and the syntax for the number of respondents
eligible to answer the corresponding question. Note that base populations are currently
available only for waves 8 to 11 (2005-08).

To select and analyse variables or data elements using the workbook, follow a similar approach
to that outlined in the following section, ‘Variable selection’.

Note that, while all variables are included in the listing, the variables provided in the data sets
(available from the Australian Social Science Data Archive) are limited by existing data
restrictions. See the section ‘Accessing the data: Data restrictions’.

Variable selection 

Not all variables assigned to a data element are directly comparable. Additional attributes 
such as question wording, values, classifications used and base populations must therefore be 
considered when selecting variables and analysing the data. 

Data elements have been created to assist in grouping and thereby simplifying variable 
selection. They are unique within a minor topic area but may not be unique across topic areas. 
For example, the data element Study type exists under the major and sub-major topic area
Education: Post-school. This data element appears under two different minor topic areas:
Study and Current study. The Study minor topic area may include both past and current study
(depending on the questionnaire sequencing). When identifying a data element and/or variable
for use, it is therefore important to consider related data elements that may be located in
a different topic area.



The example above is illustrated in figure 5 using an excerpt from the meta-data spreadsheet. 

Figure 5 Identifying related topic areas 




To identify variables for analysis and support accurate variable selection, refer to the topic
maps contained in the ‘Topic maps’ section of this User guide. Here, relevant data elements
can be identified by:

• navigating to a major topic area of interest (for example, Education)

• identifying a sub-major topic area of interest (for example, Post-school [education])

• identifying a minor topic area of interest (for example, Current study)

• inspecting the data elements available within that minor topic area (for example, Month
started study)

• checking the number of times that data element appears within a wave, which is shown in
the column corresponding to that wave.

Before using and/or analysing the variables/data elements selected, it is important to consider: 

• variable attributes such as question wording, variable values, classifications used and base 
populations 

• data elements which appear more than once within a wave 

• data elements which appear more than once across waves (for longitudinal analysis) 

• data elements of the same name across other topic areas (if applicable) 

• other data elements that may be closely linked within a topic area or across other topic
areas.
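The navigation steps above can also be applied programmatically to an export of the meta-data workbook. The Python sketch below is illustrative only: it assumes the Variables worksheet has been saved as a CSV file with columns named `Variable`, `Wave`, `Major topic area`, `Sub-major topic area`, `Minor topic area` and `Data element`. These column names and all function names are hypothetical and should be adjusted to match the actual workbook.

```python
import csv
from collections import Counter


def load_variables_sheet(path):
    """Read a CSV export of the Variables worksheet into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def find_data_element(rows, major, sub_major, minor, element):
    """Return the matching variable names and how often the element appears per wave."""
    matches = [
        row for row in rows
        if row["Major topic area"] == major
        and row["Sub-major topic area"] == sub_major
        and row["Minor topic area"] == minor
        and row["Data element"] == element
    ]
    per_wave = Counter(row["Wave"] for row in matches)
    return [row["Variable"] for row in matches], per_wave


def other_locations(rows, element):
    """List every (major, sub-major, minor) topic path where a data element
    name appears, to catch same-named elements in other topic areas."""
    return sorted({
        (row["Major topic area"], row["Sub-major topic area"], row["Minor topic area"])
        for row in rows
        if row["Data element"] == element
    })
```

For example, `find_data_element(rows, "Education", "Post-school", "Current study", "Month started study")` would return the matching variable names together with a per-wave count, and `other_locations(rows, "Study type")` should list both the Study and the Current study minor topic areas described above. Checking both before analysis covers the "same name across other topic areas" consideration in the list.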
Variable-naming conventions

Standard variables 

Most variable names are constructed using three pieces of information: the survey wave, the 
questionnaire section and the question number. 

A wave identifier is used to identify the survey wave. The first survey (or wave) is allocated
an A, the second survey a B, and so on up to wave 11, which is allocated a K. The section
identifier is used to identify the section of the questionnaire. The question identifier is used to
identify the question number.



For example, the variable AD003 refers to: 

• Wave 1, denoted by the first character A 

• Section D, denoted by the second character D 

• Question 3, denoted by the last three characters 003 
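The convention above can be decoded mechanically. The Python sketch below is a minimal illustration, assuming that every standard name is exactly one wave letter (A to K), one section letter and a three-digit question number; the function name is hypothetical. The survey-year calculation applies to the Y98 cohort only and is based on table 1 (wave 1 was 1998).

```python
import re

# Wave identifiers: A = wave 1, B = wave 2, ..., K = wave 11.
WAVE_LETTERS = "ABCDEFGHIJK"

# Assumed pattern: wave letter, section letter, three-digit question number.
STANDARD_NAME = re.compile(r"([A-K])([A-Z])(\d{3})")


def decode_standard_name(name):
    """Split a standard LSAY variable name into its wave, section and question.

    Returns None for names that do not fit the standard convention
    (for example SEX, IN2006 or WT2006).
    """
    match = STANDARD_NAME.fullmatch(name.strip().upper())
    if match is None:
        return None
    wave_letter, section, question = match.groups()
    wave = WAVE_LETTERS.index(wave_letter) + 1
    return {
        "wave": wave,
        # Y98 only: wave 1 was 1998 (see table 1), so year = 1997 + wave.
        "year": 1997 + wave,
        "section": section,
        "question": int(question),
    }


if __name__ == "__main__":
    print(decode_standard_name("AD003"))
    # {'wave': 1, 'year': 1998, 'section': 'D', 'question': 3}
```

Returning `None` rather than raising an error lets the same function be used to separate standard names from the non-standard variables described next.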

Non-standard variables 

A number of other variables do not follow the standard variable-naming convention described
above. These variables are summarised in table 6.

Table 6 Non-standard variables 

Non-standard variable | Examples of non-standard variable names | Description
Demographics | SEX, INDIG | Demographic variables, such as gender and Indigenous status, tend to have descriptive names rather than following a naming convention.
School characteristics | STATE, SCHTYP | School characteristics, such as the state of the school and school sector, tend to have descriptive names rather than following a naming convention.
Student achievement | TOT.MATH, AChLQU | Student achievement variables, such as maths scores and achievement quartiles, tend to have descriptive names rather than following a naming convention. For further information on literacy and numeracy scoring, see the section ‘Year 9 achievement in literacy and numeracy’ in this User guide.
Derived variables | XLFS2006, XCEL1999 | Derived variables have been constructed across all waves to summarise key information such as labour force status and current education level. For further information about derived variables, see the section ‘Derived variables’ in this User guide.
IN flag | IN1998, IN2006 | IN flags have been created for each survey year to indicate whether a respondent participated in the survey in that year. An IN flag value of 1 indicates that the respondent participated in the survey for that year. IN flag variables are denoted by the two characters ‘IN’ followed by four digits for the survey year.
Interview dates | DINT00, MINT00, YINT00, INTDAT00, INTSAS00 | Day, month and year of interview are collected each survey year and consolidated into an interview date variable. Interview date variables are denoted by DINT for day of interview, MINT for month of interview, YINT for year of interview, and INTDAT for the consolidated interview date (provided in both character and SAS® date format, as in INTDAT00 and INTSAS00), followed by two digits for the survey year.
Sample items | SAMP108, SAMP208 | Sample items that draw on information from previous years’ surveys have been created to route questions more efficiently and effectively. For example, the variable SAMP208 records whether the respondent had a job at the previous interview: questions about whether respondents still have the same job as at their last interview are asked only of those recorded as employed at the previous interview. Sample items are denoted by the four characters ‘SAMP’, followed by one digit identifying the sample item and two digits for the survey year.
Weights | WT06GEN, ACH06WT, WT2006 | Weight variables are denoted by the two characters ‘WT’ at either the beginning or the end of the variable name. For further information about weights, see ‘Weights’ in the section ‘Sample and survey design’ of this User guide.
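Two of the non-standard variable types lend themselves to simple processing outside SAS®. SAS date values count days from 1 January 1960, so the SAS-format interview date can be converted with plain date arithmetic, and an IN flag can be used to keep only that year's participants. The sketch below is a minimal illustration that assumes each record is a Python dictionary keyed by variable name; because the coding of non-participation is not stated here, it treats any IN flag value other than 1 as "did not participate". All helper names are hypothetical.

```python
from datetime import date, timedelta

# SAS date values are the number of days since 1 January 1960.
SAS_EPOCH = date(1960, 1, 1)


def sas_to_date(sas_days: float) -> date:
    """Convert a SAS date value (e.g. from an INTSAS variable) to a Python date."""
    return SAS_EPOCH + timedelta(days=int(sas_days))


def in_flag_name(year: int) -> str:
    """IN flag name for a survey year, e.g. 2006 -> 'IN2006'."""
    return f"IN{year:04d}"


def respondents_in_year(records, year):
    """Keep records whose IN flag for `year` equals 1 (participated that year)."""
    flag = in_flag_name(year)
    return [r for r in records if r.get(flag) == 1]


# Hypothetical records for illustration only.
records = [{"id": 1, "IN2006": 1}, {"id": 2, "IN2006": 0}, {"id": 3}]
print([r["id"] for r in respondents_in_year(records, 2006)])
print(sas_to_date(0))
```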



Historical variable names 

From wave 1 (1998) to wave 7 (2004), a chronological variable-naming convention was used. 
These variable names did not reflect the survey year/wave, questionnaire section or 
question number, but took the format V1, V2 ... V4498. 

This approach was superseded in wave 8 (2005) by the standard variable-naming convention 
described above, and all old variable names in the data set were subsequently updated to 
follow it. 

For this reason, the variable names in the existing technical documents do not correspond 
directly with the current data sets. Variable names can, however, be determined from the 
survey year/wave, questionnaire section and question number (see the section ‘Variable-naming 
conventions’). Alternatively, a variable concordance file that maps the old to the new variable 
names can be accessed at: <www.lsay.edu.au/publications/2199.html>. 
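When older analysis code refers to the chronological names, a concordance can be applied as a simple lookup. The layout of the published concordance file is not described here, so the sketch below assumes, purely for illustration, a CSV with two columns headed `old` and `new`; the file's actual layout should be checked first. The example rows are placeholders, not real LSAY mappings, and the helper names are hypothetical.

```python
import csv
import io


def load_concordance(csv_text: str) -> dict:
    """Read an old -> new variable-name mapping from CSV text.

    ASSUMPTION: columns headed 'old' and 'new'; adjust to the real file layout.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["old"].strip(): row["new"].strip() for row in reader}


def rename_variables(names, concordance):
    """Map old names to new ones, leaving names without a mapping unchanged."""
    return [concordance.get(n, n) for n in names]


# Placeholder rows for illustration only; they are NOT real LSAY mappings.
example = "old,new\nOLD_VAR_1,NEW_VAR_1\nOLD_VAR_2,NEW_VAR_2\n"
mapping = load_concordance(example)
print(rename_variables(["OLD_VAR_1", "OLD_VAR_2", "SEX"], mapping))
# ['NEW_VAR_1', 'NEW_VAR_2', 'SEX']
```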
Derived variables 

A series of derived variables has been developed to simplify use of the LSAY data and to provide 
useful measures and indicators for analysis. The derived variables cover educational 
attainment, employment, engagement in study and work, and social indicators. 

Table 7 summarises the additional derived variables available on the 
Y98 data set. 

Derived variables are denoted by the character X, followed by three characters uniquely 
identifying the derived variable, then four digits for the survey year (for example, XLFS2006). 

Detailed technical documentation outlining how the variables are derived as well as their 
properties is forthcoming and will be linked within this document when it becomes available. 



Table 7 Derived variables 

Indicators | Derived variable | Variable name
Education | Attending school | XCSLYYYY
 | Level of current study (study leading to a qualification) | XCELYYYY
 | Study status in VET (incl. apprenticeships and traineeships) | XVETYYYY
 | Status of study in bachelor degree or higher | XBACYYYY
 | Mode of attendance | XFTSYYYY
 | Highest year of school completed | XHSLYYYY
 | Completed Year 12 or certificate level II or higher | X122YYYY
 | Completed Year 12 or certificate level III or higher | X123YYYY
 | Highest non-school qualification completed | XHELYYYY
Employment | Labour force status at time of interview | XLFSYYYY
 | Full- or part-time status of main job | XFTPYYYY
 | Employment status | XEMPYYYY
 | Undertaking an apprenticeship or traineeship | XATRYYYY
 | Job mobility during last year | XMOBYYYY
 | ASCO 1-digit occupation of respondent | XOCCYYYY
 | Average gross weekly pay for those in full-time employment | XWKPYYYY
 | Average hourly wage for all respondents | XHRPYYYY
 | Average weekly working hours | XHRSYYYY
Study and work | Whether in full-time education or full-time employment | XFTEYYYY
 | Whether or not had any spell of unemployment during the year | XUNEYYYY
Social indicators | Marital status | XMARYYYY
 | Living in parental home | XATHYYYY
 | Living in own home | XOWNYYYY
 | Number of dependent children | XCHIYYYY
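Because every entry in Table 7 follows the same X + three characters + YYYY pattern, derived-variable names can be generated or parsed programmatically. The sketch below is a hedged illustration: the code dictionary copies a subset of Table 7, while the helper names and the choice of years are assumptions for the example. A generated name should be checked against the data set before use, as this guide does not confirm that every indicator exists in every year.

```python
import re

# Three-character codes taken from Table 7 (subset shown).
DERIVED_CODES = {
    "CSL": "Attending school",
    "CEL": "Level of current study",
    "HSL": "Highest year of school completed",
    "122": "Completed Year 12 or certificate level II or higher",
    "LFS": "Labour force status at time of interview",
    "FTE": "Whether in full-time education or full-time employment",
    "MAR": "Marital status",
}

# X + three characters + four-digit survey year, e.g. XLFS2006 or X1222006.
DERIVED_PATTERN = re.compile(r"^X([A-Z0-9]{3})(\d{4})$")


def derived_name(code: str, year: int) -> str:
    """Build a derived-variable name, e.g. ('LFS', 2006) -> 'XLFS2006'."""
    return f"X{code.upper()}{year:04d}"


def parse_derived(name: str):
    """Return (code, year) for a derived-variable name, or None if no match."""
    m = DERIVED_PATTERN.match(name.upper())
    return (m.group(1), int(m.group(2))) if m else None


# Candidate labour force status names for waves 1 to 11 (1998 to 2008).
names = [derived_name("LFS", y) for y in range(1998, 2009)]
print(names[0], names[-1], DERIVED_CODES[parse_derived("XLFS2006")[0]])
```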
Sample and survey design 

The Y98 cohort is a nationally stratified sample of students who were in Year 9 in 1998. The 
major stratum considered in the design was the state of schooling in 1998. Students from 
small states were oversampled, and those from larger states were undersampled. Within states, 
students were selected in proportion to school sector, with three school sectors used as strata: 
government, Catholic and independent schools. 

The population information for the strata was drawn from the Schools Australia (ABS, 
cat. no. 4221.0) series. Within strata, schools were selected with probability proportional to 
their size; the number of Year 9 students in each school came from a sampling frame derived 
by the Australian Council for Educational Research (based on information provided to it by 
state authorities and the former Department of Employment, Education and Training). 
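This guide does not say exactly how the probability-proportional-to-size (PPS) selection of schools was carried out. Systematic PPS sampling is one common way to do it, and the sketch below illustrates that technique under that assumption, using hypothetical school identifiers and Year 9 enrolments as the measure of size. Each school occupies a stretch of a cumulative size line, a random start is drawn, and a school is selected for every equally spaced point that falls in its stretch, so larger schools are more likely to be chosen.

```python
import random


def systematic_pps(sizes, n_select, rng=None):
    """Select n_select units with probability proportional to size (systematic PPS).

    sizes: dict mapping a (hypothetical) school id to its measure of size,
    such as Year 9 enrolment. A unit larger than the sampling interval can be
    hit more than once; real designs usually take such units with certainty.
    """
    rng = rng or random.Random()
    total = sum(sizes.values())
    interval = total / n_select
    start = rng.uniform(0, interval)  # random start within the first interval
    targets = [start + k * interval for k in range(n_select)]

    selected, cumulative, t = [], 0.0, 0
    for unit, size in sizes.items():
        cumulative += size
        # A unit is selected once for each target point inside its size range.
        while t < n_select and targets[t] < cumulative:
            selected.append(unit)
            t += 1
    return selected


schools = {"school_A": 120, "school_B": 45, "school_C": 300, "school_D": 80}
print(systematic_pps(schools, 2, random.Random(42)))
```

In a stratified design, this selection would be run separately within each stratum (here, state by sector).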

An additional 500 Year 9 students were selected to form the pilot sample. 

In the first year of the survey, reading and numeracy tests were administered to students in 
their schools to provide information on school achievement. Students also completed a 
written background questionnaire about their educational and vocational plans and attitudes 
to school. 

One year later, these students were contacted and they completed another written 
questionnaire, which was followed by annual telephone interviews until 2008. Respondents 
who missed a survey wave were excluded from subsequent survey waves. 

Further information regarding survey design for the Y98 cohort can be found in LSAY 
technical paper no. 16, The designed and achieved sample of the 1998 LSAY sample, which 
can be accessed at: <www.lsay.edu.au/publications/1929.html>. 

Response rates 

The Y98 cohort initially surveyed 14 117 young people when they were in Year 9. In the 
second year of the survey, a paper-based questionnaire was used, which resulted in a higher 
than expected drop-out. Therefore, in 1999 the survey was rebuilt and a computer-assisted 
telephone interview (CATI) system was implemented. This increased the sample size from 
9289 (wave 2, 1999) to 9548 (wave 3, 2000). The CATI system has been used from 1999 onwards, 
and the overall attrition rate has been 8-13% per year. 

Table 8 shows the sample sizes and response rates for the Y98 cohort from 1998 to 2008. 



Table 8 Sample sizes and response rates 

Wave/year | Age at 30 June | Sample size (n) | % of wave 1 | % of previous wave
1/1998 | 14.5 | 14 117 | 100 | 
2/1999 | 15.5 | 9 289 | 65.8 | 65.8
3/2000 | 16.5 | 9 548 | 67.6 | 102.8
4/2001 | 17.5 | 8 777 | 62.2 | 91.9
5/2002 | 18.5 | 7 762 | 55.0 | 88.4
6/2003 | 19.5 | 6 902 | 48.9 | 88.9
7/2004 | 20.5 | 5 979 | 42.4 | 86.6
8/2005 | 21.5 | 5 356 | 37.9 | 89.6
9/2006 | 22.5 | 4 729 | 33.5 | 88.3
10/2007 | 23.5 | 4 210 | 29.8 | 89.0
11/2008 | 24.5 | 3 859 | 27.3 | 91.7
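The two percentage columns of Table 8 follow directly from the sample sizes: each wave is divided by the wave 1 sample and by the previous wave's sample. The sketch below recomputes them as a check, using the Table 8 counts. The year-to-year attrition rate is 100 minus the "% of previous wave" figure, and the value above 100 in 2000 reflects the sample rebuild described above.

```python
# Sample sizes by survey year, from Table 8.
SAMPLE_SIZES = {
    1998: 14117, 1999: 9289, 2000: 9548, 2001: 8777, 2002: 7762, 2003: 6902,
    2004: 5979, 2005: 5356, 2006: 4729, 2007: 4210, 2008: 3859,
}


def retention_table(sizes):
    """Return (year, n, % of wave 1, % of previous wave) rows, rounded to 1 dp."""
    years = sorted(sizes)
    base = sizes[years[0]]
    rows, prev = [], None
    for year in years:
        n = sizes[year]
        pct_wave1 = round(100 * n / base, 1)
        pct_prev = None if prev is None else round(100 * n / prev, 1)
        rows.append((year, n, pct_wave1, pct_prev))
        prev = n
    return rows


for row in retention_table(SAMPLE_SIZES):
    print(row)
```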
Attrition 

Survey attrition occurs when respondents from earlier waves do not answer the survey in 
subsequent waves of interviewing. 

The data collection contractor works hard at achieving maximal response rates, but there is 
still drop-out between waves of the interviews. 

Survey attrition becomes a problem for reporting survey results when different groups of 
people drop out at different rates, because the remaining sample then no longer resembles the 
original population. Attrition can therefore lead to biased population estimates with incorrect 
standard errors. In LSAY, survey attrition is counteracted first by trying to maximise the 
year-on-year response rate and, second, through the application of attrition weights. 

Weights 

In order for the LSAY sample to more accurately represent the population of Australian 
Year 9 students in 1998, the collected sample must be weighted to account for differences in 
sampling fractions and response rates among the population. 

There are two weighting procedures applied to the LSAY data: 

1. Sample weights: these reflect the original sample design and ensure that the sample 
matches the population from which it was drawn. In LSAY, the sample weights sum to the 
sample size; for example, they add to 14 117 in wave 1, 9289 in wave 2, and so on. Students 
from states and territories with smaller numbers of Year 9 students were oversampled, and 
students from states with larger numbers of Year 9 students were undersampled. To represent 
the population of Australian Year 9 students more accurately, the sample is weighted so that 
weighted sample sizes within strata are proportional to the population sizes of the strata. The 
weighted distribution across stratum levels (state and school sector) therefore matches that of 
the original population. 

2. Attrition weights: these account for most of the non-random respondent attrition. LSAY 
attrition weights are based on overall achievement quartiles and gender, and reweight the 
sample to wave 1. 
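The two weighting steps can be sketched as follows. This guide does not give the exact formulas, so the code is an illustration under stated assumptions, not the LSAY procedure. The sample weight for stratum h is taken as (N_h / N) / (n_h / n), which makes the weights sum to the sample size. The attrition adjustment for a cell (for example, achievement quartile by gender) is taken as the cell's weighted share in wave 1 divided by its weighted share among current respondents, so that current respondents reproduce the wave 1 distribution. Stratum names and counts are hypothetical.

```python
from collections import defaultdict


def sample_weights(pop_counts, sample_counts):
    """Stratum weights that sum to the sample size.

    pop_counts, sample_counts: dicts mapping stratum -> population count N_h
    and sample count n_h. Each sampled student in stratum h gets
    w_h = (N_h / N) / (n_h / n), so the weights add up to n.
    """
    N = sum(pop_counts.values())
    n = sum(sample_counts.values())
    return {h: (pop_counts[h] / N) / (n_h / n) for h, n_h in sample_counts.items()}


def cell_shares(pairs):
    """Weighted share of each cell, from (cell, weight) pairs."""
    totals = defaultdict(float)
    for cell, w in pairs:
        totals[cell] += w
    grand = sum(totals.values())
    return {cell: t / grand for cell, t in totals.items()}


def attrition_factors(wave1_pairs, current_pairs):
    """Factors that restore each cell's wave 1 share among current respondents.

    ASSUMPTION: a cell is e.g. (achievement quartile, gender).
    """
    s1 = cell_shares(wave1_pairs)
    s_now = cell_shares(current_pairs)
    return {cell: s1[cell] / s_now[cell] for cell in s_now}


# Hypothetical strata: a small and a large state, sampled equally.
print(sample_weights({"small": 1000, "large": 9000}, {"small": 500, "large": 500}))
```

Multiplying a respondent's sample weight by their cell's attrition factor is one natural way to combine the two, assumed here for illustration.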

The final LSAY weights for each wave combine sampling and attrition weights. Weighted 
data are presented in all cohort reports unless otherwise stated. 

Despite attempts to counteract attrition bias, users must be aware that survey drop-out may 
not be fully accounted for in the attrition weights for all sub-populations. To allow users to 
determine the effectiveness of the attrition weights, data in the cohort report demographic 
tables are presented both weighted and unweighted. 

Table 9 shows the three different types of available weights and the variable-naming 
convention for each, where YY and YYYY denote the survey year. 



Table 9 Weight variables 

Weight | Variables
Sample weight | WTYYGEN
Attrition weight | ACHYYWT
Final weight | WTYYYY
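Following Table 9, the three weight variables for a given survey year can be named programmatically and the final weight applied to an estimate. The sketch below is a minimal illustration: the helper names and the records are hypothetical, and it assumes the two-digit year YY is the last two digits of the survey year (as in WT06GEN and ACH06WT for 2006).

```python
def weight_names(year: int) -> dict:
    """Names of the three weight variables for a survey year, following Table 9."""
    yy = f"{year % 100:02d}"
    return {
        "sample": f"WT{yy}GEN",
        "attrition": f"ACH{yy}WT",
        "final": f"WT{year:04d}",
    }


def weighted_proportion(indicator, weights):
    """Weighted share of records where the 0/1 indicator equals 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w for x, w in zip(indicator, weights) if x == 1) / total


# Hypothetical records: a 0/1 indicator and the final weight (WT2006) for each.
completed_year12 = [1, 0, 1, 1]
wt2006 = [0.8, 1.6, 1.1, 0.5]
print(weight_names(2006)["final"], round(weighted_proportion(completed_year12, wt2006), 3))
```

Running the same function with all weights set to 1 gives the unweighted figure, which mirrors the weighted-versus-unweighted comparison in the cohort report demographic tables.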
Reliability of estimates 

The reliability of any estimates (for example, proportions, means, regression coefficients or 
variance parameters) must be considered. The greatest contributor to standard errors is the 
sample size. Small sample sizes result in high standard errors and wide confidence intervals. 
Users of the LSAY data must consider sample size when deriving or interpreting the data. 

Users are advised against relying on estimates obtained from sample sizes of fewer than five 
respondents, or on estimates that have a relative standard error (RSE) greater than 25%. 

In the LSAY cohort reports, estimates obtained from sample sizes of fewer than five 
respondents are marked with double asterisks (for example, 5.0**), and estimates that 
have a relative standard error greater than 25% are marked with a single asterisk (for example, 5.0*). 
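The marking rule can be expressed as a small function. In the sketch below, the helper names are hypothetical, and the choice to show only "**" when both conditions apply is an assumption, since the precedence is not stated here. The checks reproduce the markings in Table 11.

```python
def reliability_flag(n: int, rse_pct: float) -> str:
    """'**' if fewer than five respondents, '*' if RSE > 25%, else ''.

    ASSUMPTION: the '**' rule takes precedence when both conditions apply.
    """
    if n < 5:
        return "**"
    if rse_pct > 25:
        return "*"
    return ""


def format_estimate(estimate: float, n: int, rse_pct: float) -> str:
    """Format a percentage estimate to one decimal place with its flag."""
    return f"{estimate:.1f}{reliability_flag(n, rse_pct)}"


# Rows from Table 11 (Indigenous respondents, 2006).
print(format_estimate(58.1, 40, 12.419), format_estimate(19.9, 19, 26.381))
```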

Sources of error 

LSAY has two major types of error: non-sampling error and sampling error. 

Non-sampling error arises from processes not related to the selection of a sample from a 
population. Examples of non-sampling error include non-response, attrition, incorrect 
responses and interviewer and processing error. Elements of non-sampling error can be 
accounted for by using weighted estimates (for example, LSAY uses weights to adjust for 
attrition). Other elements that contribute to non-sampling error can be minimised through data 
checking and other protocols. Issues arising from non-sampling error should be noted or addressed 
where relevant. There are no statistical measures that accurately quantify non-sampling error 
(apart from those related to attrition and non-response). 

Sampling error arises because estimates are obtained from a sample rather than from 
measurement of the entire population, so an estimate of interest is subject to 
sample-to-sample variation. Sampling error is controlled by taking a large enough random sample 
from the population, and sample surveys are designed to control the size of the sampling error for 
key measurements. In random (probability) sampling, sampling error is measured by the 
standard error of the estimate. 

Standard errors 

The standard error of an estimate indicates how precisely that estimate approximates 
the true population parameter. There are multiple methods for calculating standard errors 
in complex surveys; one commonly used method is Taylor series expansion (see footnote 3). This 
technique has been applied to obtain estimates of standard errors for the LSAY cohort reports. 
These standard errors are then used to determine confidence intervals and relative standard 
errors, and all three measures are used to assess the reliability of the estimate of interest. 

In particular, the relative standard error enables the precision of two estimates to be 
compared. 
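As an illustration of Taylor series (linearisation) variance estimation, the sketch below computes the standard error of a weighted mean or proportion under a stratified, clustered design, using the widely used with-replacement approximation. The exact settings used for the cohort reports (for example, finite population corrections or how single-school strata were handled) are not stated here, so this is an assumed version, not the LSAY calculation. Each record's linearised value w_i (y_i - p) / W is totalled within its school (primary sampling unit), and the variance of those totals is computed within each stratum.

```python
from collections import defaultdict
import math


def linearised_se(y, w, strata, psu):
    """Taylor-linearised standard error of a weighted mean or proportion.

    y: values (0/1 for a proportion); w: weights; strata, psu: design labels
    per record (e.g. state-by-sector stratum and school). Uses the common
    with-replacement approximation; strata with a single PSU contribute 0.
    """
    W = sum(w)
    est = sum(wi * yi for wi, yi in zip(w, y)) / W
    # Linearised values z_i = w_i (y_i - est) / W, totalled within each PSU.
    psu_tot = defaultdict(float)
    for yi, wi, h, c in zip(y, w, strata, psu):
        psu_tot[(h, c)] += wi * (yi - est) / W
    by_stratum = defaultdict(list)
    for (h, _), z in psu_tot.items():
        by_stratum[h].append(z)
    var = 0.0
    for zs in by_stratum.values():
        n_h = len(zs)
        if n_h < 2:
            continue
        mean = sum(zs) / n_h
        var += n_h / (n_h - 1) * sum((z - mean) ** 2 for z in zs)
    return est, math.sqrt(var)


# Hypothetical data: two strata, two schools each, two students per school.
y = [1, 0, 1, 1, 0, 0, 1, 0]
w = [1.2, 1.2, 0.8, 0.8, 1.5, 1.5, 0.9, 0.9]
strata = ["s1"] * 4 + ["s2"] * 4
psu = ["a", "a", "b", "b", "c", "c", "d", "d"]
print(linearised_se(y, w, strata, psu))
```

With equal weights and one student per school in a single stratum, this reduces to the familiar s/sqrt(n) for a sample proportion.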



3 For further information on this technique, users are encouraged to read WG Cochran, Sampling 
Techniques, 3rd edition, John Wiley and Sons, New York, 1977, sections 11.18, 11.19 and 11.20. 
Confidence intervals 

The confidence interval is an interval estimate of the population parameter. Sample estimates 
which have high standard errors will have wide confidence intervals. 

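The formula for a 95% confidence interval for a proportion is: 

$$p \pm 2 \times \mathrm{se}(p)$$

where p is the estimate obtained from the sample, se(p) is the standard error of the 
estimate, and 2 is a rounded value of 1.96, the multiplier for a 95% interval under the 
normal approximation. 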

Relative standard error 

The relative standard error is derived by dividing the standard error of the estimate by the 
estimate itself, expressed as a percentage: 

$$\mathrm{RSE}(p) = \frac{\mathrm{se}(p)}{p} \times 100$$

The relative standard error (RSE) is a standardised measure that enables the comparison of 
estimates in terms of their reliability. 

It is important that users take into consideration the reliability of estimates obtained. An 
estimate with a high relative standard error or wide confidence interval should be used with 
caution. This is particularly important when users are comparing two or more estimates. 
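Both formulas translate directly into code. The sketch below (hypothetical helper names) checks them against the Year 12 row of Table 11. The published percentage (58.1) is rounded, so the example uses 58.1195, the midpoint of the published confidence limits, as an approximation of the unrounded estimate; with it, the computed limits and RSE match the table to three decimal places.

```python
def confidence_interval(p, se, multiplier=2.0):
    """Return (lower, upper) = p -/+ multiplier * se; 2 approximates 1.96."""
    return p - multiplier * se, p + multiplier * se


def relative_standard_error(p, se):
    """RSE(p) = se(p) / p * 100, in per cent."""
    return se / p * 100


# Year 12, Indigenous respondents, Table 11 (unrounded estimate approximated).
p, se = 58.1195, 7.2177
lower, upper = confidence_interval(p, se)
print(round(lower, 3), round(upper, 3), round(relative_standard_error(p, se), 3))
```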

Examples 

Consider the following estimates of highest school level completed (XHSL2006) by 2006 in 
the Y98 cohort (obtained from the Y98 cohort reports for 2006). In this example, the 
estimates from the entire sample (large sample) are compared with the estimates obtained 
from Indigenous respondents (small sample). 



Table 10 Estimates, standard errors, RSEs and confidence limits for highest school level 
completed, Y98 cohort in 2006 for a large sample (all respondents) 

Level | Frequency | % | Standard error of % | RSE (%) | 95% CI lower limit | 95% CI upper limit
Year 12 | 4030 | 82.5 | 0.7102 | 0.861 | 81.038 | 83.878
Year 11 | 376 | 8.7 | 0.5317 | 6.079 | 7.683 | 9.810
Year 10 | 295 | 8.1 | 0.5149 | 6.382 | 7.039 | 9.098
Year 9 or below | 28 | 0.7 | 0.1579 | 21.713 | 0.411 | 1.043
Total | 4729 | 100.0 | | | | 
Table 11 Estimates, standard errors, RSEs and confidence limits for highest school level 
completed, Y98 cohort in 2006 for a small sample (Indigenous respondents) 

Level | Frequency | % | Standard error of % | RSE (%) | 95% CI lower limit | 95% CI upper limit
Year 12 | 40 | 58.1 | 7.2177 | 12.419 | 43.684 | 72.555
Year 11 | 19 | 19.9* | 5.2478 | 26.381 | 9.397 | 30.388
Year 10 | 11 | 22.0* | 6.5965 | 30.000 | 8.795 | 35.181
Total | 70 | 100.0 | | | | 

* Estimate has a relative standard error greater than 25%. 



Using this example, we can see that the estimate for Indigenous respondents who finished Year 10 
or below (22.0%) has a relative standard error of 30.0% and is much less reliable than the 
estimate obtained using the whole sample (8.1%), which has a relative standard error of 6.4%. 

Further, in this example, we would not recommend using any of the estimates obtained from 
the Indigenous respondents, with the exception of Year 12 completions. The confidence 
interval is interpreted as follows: if the sampling were repeated many times, about 95% of the 
intervals constructed in this way would contain the true population value. For Year 12 
completion among Indigenous respondents, the interval runs from 43.7% to 72.6%. 

In interpreting these results, it should be noted that Tables 10 and 11 also demonstrate the 
impact of different levels of attrition on the reliability of estimates. In particular, the relatively 
high level of attrition among Indigenous respondents means that the number of Indigenous 
respondents remaining in 2006 is particularly small, with correspondingly large relative 
standard errors. Sampling strategies for LSAY cohorts from 2003 onwards have attempted to 
address this by oversampling the Indigenous population. 
Classifications and code frames 

A number of variables in the LSAY data sets are coded using 
Australian Bureau of Statistics (ABS) classifications or other code frames (for example, 
for institution). The information for these variables is collected using open-ended questions, and 
verbatim responses are recorded. These responses are then coded using the ABS classification 
structure (or other relevant code frame). 

The details of these classifications are not provided in the data elements documents because 
they are very lengthy and can be summarised in various ways. This section provides a 
summary of the classifications and code frames used for each survey wave and references the 
relevant classifications and code frames. 



Table 12 Classifications 

Wave/year | Education | Occupation | Industry | Institution
1/1998 | Not applicable | ASCO edition 2 | Not applicable | Not applicable
2/1999 | Not applicable | ASCO edition 2 | Not applicable | Not applicable
3/2000 | FOSCTEC | ASCO edition 2 | ANZSIC | Institution code frame 1
4/2001 | FOSCTEC | ASCO edition 2 | ANZSIC | Institution code frame 1
5/2002 | FOSCTEC | ASCO edition 2 | ANZSIC | Institution code frame 1
6/2003 | ASCED | ASCO edition 2 | ANZSIC | Institution code frame 1
7/2004 | ASCED | ASCO edition 2 | ANZSIC | Institution code frame 1
8/2005 | ASCED | ASCO edition 2 | ANZSIC | Institution code frame 2
9/2006 | ASCED | ANZSCO | ANZSIC (2006 revision) | Institution code frame 2
10/2007 | ASCED | ANZSCO | ANZSIC (2006 revision) | Institution code frame 2
11/2008 | ASCED | ANZSCO | ANZSIC (2006 revision) | Institution code frame 2

ASCO = Australian Standard Classification of Occupations; ASCED = Australian Standard Classification of Education; 
ANZSCO = Australian and New Zealand Standard Classification of Occupations; ANZSIC = Australian and New Zealand 
Standard Industrial Classification; FOSCTEC = Field of Study Classification of Tertiary Education Courses. 
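Before pooling a coded variable across waves, it is worth checking that the waves used the same classification. The sketch below encodes Table 12 as a lookup function; the function names and the wave-from-year rule (wave 1 = 1998 for the Y98 cohort) are illustrative assumptions, and the classification labels are copied from the table.

```python
KINDS = ("education", "occupation", "industry", "institution")


def classification(kind: str, wave: int):
    """Classification used for `kind` in a Y98 wave (1-11), or None if not applicable."""
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind!r}")
    if not 1 <= wave <= 11:
        raise ValueError("Y98 waves run from 1 (1998) to 11 (2008)")
    if kind == "occupation":
        return "ASCO edition 2" if wave <= 8 else "ANZSCO"
    if wave <= 2:
        return None  # not applicable in waves 1 and 2
    if kind == "education":
        return "FOSCTEC" if wave <= 5 else "ASCED"
    if kind == "industry":
        return "ANZSIC" if wave <= 8 else "ANZSIC (2006 revision)"
    return "Institution code frame 1" if wave <= 7 else "Institution code frame 2"


def same_frame(kind: str, wave_a: int, wave_b: int) -> bool:
    """True if both waves used the same (applicable) classification for `kind`."""
    a, b = classification(kind, wave_a), classification(kind, wave_b)
    return a is not None and a == b


def wave_for_year(year: int) -> int:
    """Y98 cohort: 1998 is wave 1, 2008 is wave 11."""
    return year - 1997


print(classification("occupation", wave_for_year(2006)), same_frame("occupation", 8, 9))
```

Where `same_frame` returns False (for example, occupation between 2005 and 2006), a correspondence between the two classifications is needed before comparing waves.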



Education 

For waves 1 and 2 of the Y98 cohort, no variable related to area of study was collected. 

The Field of Study Classification of Tertiary Education Courses (FOSCTEC) was used to code 
area of study in waves 3 to 5 (2000 to 2002). This classification was officially superseded 
by the Australian Standard Classification of Education 4 (ASCED) in 2000. However, ASCED 
was only used in LSAY from waves 6 to 11 (2003 to 2008). 

FOSCTEC classifications are no longer readily available on the ABS website, although the ASCED 
classification does provide correspondence tables between FOSCTEC and ASCED 
classifications. The FOSCTEC classification can be accessed at: 
<www.lsay.edu.au/publications/2199.html>. 



4 ABS, Australian Standard Classification of Education (ASCED), cat. no. 1272.0, ABS, Canberra, 2001. 
Occupation 

The Australian Standard Classification of Occupations 5 (ASCO) edition 2 was used to code 
occupations from waves 1 to 8 (1998 to 2005). From wave 9 (2006), the Australian and New 
Zealand Standard Classification of Occupations 6 (ANZSCO) was used. 

Industry 

The Australian and New Zealand Standard Industrial Classification 7 (ANZSIC) 1993 was 
used to code industries from waves 3 to 8 (2000 to 2005). From wave 9 (2006), the 2006 
revision of ANZSIC was used. 

Institution 

Institution code frames have been developed to enable consistent coding of education 
institutions. The first code frame uses four digits to code institutions from waves 3 to 7 (2000 
to 2004). 

The code frame was revised to incorporate information about the institution campus and uses 
six digits to code institutions (including campus) from wave 8 (2005). 

These institution code frames can be accessed at: <www.lsay.edu.au/publications/2199.html>. 



5 ABS, Australian Standard Classification of Occupations, 2nd edn, cat. no. 1220.0, ABS, Canberra, 
1997. 

6 ABS, Australian and New Zealand Standard Classification of Occupations, 1st edn, cat. no. 1220.0, 
ABS, Canberra, 2006. 

7 ABS, Australian and New Zealand Standard Industrial Classification, cat. no. 1292.0, ABS, 
Canberra, 1993. 
Country of birth Country of birth: Verbatim 

Country of birth: All 
Subjects: Maths 
Subjects: Overall 

Subjects: Humanities and social sciences 
Subjects: Economics and business 
Received (state specific) score 

Result known 

Result 
Provides information: Finance 

Accessed information: Apprenticeships and traineeships 

Accessed information: Careers 

Accessed information: TAFE 
Reasons: Subjects/courses not available at school 
Reasons: Year 12 wouldn’t help get a job 
Reasons: Year 12 wouldn’t help with further study/training 
Reasons: Main reason 
Year stopped study 

Apprenticeships/traineeships Still studying 

Confirmation of apprenticeship/traineeship 
Qualification type 
Reasons: Deferred from study 
Reasons: Didn’t like course 
Reasons: Didn’t think course was worth doing 
Reasons: You got all you wanted from course 
Reasons: You weren’t happy with the off-the-job training 
Reasons: You found the study or training too difficult 
Reasons: Because of problems with travelling or transport 
Reasons: Because of health or personal reasons 
Sources of income: Paid work 
Sources of income: Parents or family 
Sources of income: Scholarship or cadetship 
Sources of income: Scholarship 
Hours worked per week (present job) 

Hours worked per week (main job if more than one) 
Hours worked per week (all jobs if more than one) 
Hours worked per week (job reported at last interview) 
Job satisfaction: Pay 

Job satisfaction: Opportunities for training 
Job satisfaction: Tasks assigned 
Disability funding: Skin/allergies 

Disability funding: Breathing/asthma/bronchitis 

Disability funding: Heart/blood pressure 

Disability funding: Stomach/liver/kidney/digestive problems 